ctober 2
Discrete Wavelet Transform as a Facilitator for Expressive Latent Space Representation in Variational Autoencoders in Satellite Imagery
Mahara, Arpan, Khan, Md Rezaul Karim, Rishe, Naphtali, Wang, Wenjia, Sadjadi, Seyed Masoud
Latent Diffusion Models (LDM), a subclass of diffusion models, mitigate the computational complexity of pixel-space diffusion by operating within a compressed latent space constructed by Variational Autoencoders (VAEs), demonstrating significant advantages in Remote Sensing (RS) applications. Though numerous studies enhancing LDMs have been conducted, investigations explicitly targeting improvements within the intrinsic latent space remain scarce. This paper proposes an innovative perspective, utilizing the Discrete Wavelet Transform (DWT) to enhance the VAE's latent space representation, designed for satellite imagery. The proposed method, ExpDWT-VAE, introduces dual branches: one processes spatial domain input through convolutional operations, while the other extracts and processes frequency-domain features via 2D Haar wavelet decomposition, convolutional operation, and inverse DWT reconstruction. These branches merge to create an integrated spatial-frequency representation, further refined through convolutional and diagonal Gaussian mapping into a robust latent representation. We utilize a new satellite imagery dataset housed by the TerraFly mapping system to validate our method. Experimental results across several performance metrics highlight the efficacy of the proposed method at enhancing latent space representation.
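The 2D Haar decomposition and inverse DWT reconstruction mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the ExpDWT-VAE implementation: the function names `haar_dwt2` and `haar_idwt2` are hypothetical, the convolution applied between the two steps in the frequency branch is omitted, the input is assumed to be a single-channel array with even height and width, and the sub-band labels follow one common convention (others swap LH and HL).

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT (illustrative helper, assumes even H and W)."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass approximation
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array from the four sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

# Round trip on a toy 4x4 "image".
image = np.arange(16.0).reshape(4, 4)
subbands = haar_dwt2(image)
restored = haar_idwt2(*subbands)
```

Because the transform is orthonormal, the round trip is lossless and the total energy of the four sub-bands equals that of the input, so any change in the reconstruction comes only from whatever processing is applied to the sub-bands in between.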
AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis
Khodorchenko, Maria, Butakov, Nikolay, Zuev, Maxim, Nasonov, Denis
Topic modeling is a well-known technique for modeling the internal structure of a text corpus, represented as a set of interrelated word sets known as topics. From Latent Semantic Analysis (LSA) [1] and Non-negative Matrix Factorization (NMF) [2] to probabilistic and neural approaches, topic modeling has proved to be a valuable tool for solving a range of practical tasks [3, 4]. One of the key features of topic modeling lies in the interpretability of the resulting representations, which enables easier comprehension of complex datasets and helps in extracting meaningful insights. To be useful, topic models should be flexible enough to model corpora of different nature, origin, and language. This requires the model to be carefully tuned for the corpus under consideration, and the tuning effort is usually closely connected with the number of hyperparameters the model has. This is especially true for additively regularized topic models, a semi-probabilistic group of methods that show great adaptability but require setting a large number of parameters and the expertise to do so properly. This paper presents the AutoTM 2.0 framework, which allows effective usage of additively regularized models, as they provide the most flexible way to process datasets with different statistical characteristics. Our main contributions can be summarized as follows: a significant simplification of the use of flexible additively regularized models by offering automatic single-objective optimization procedures, and metrics that closely align with human judgment.
Easydiagnos: a framework for accurate feature selection for automatic diagnosis in smart healthcare
Maji, Prasenjit, Mondal, Amit Kumar, Mondal, Hemanta Kumar, Mohanty, Saraju P.
The rapid advancements in artificial intelligence (AI) have revolutionized smart healthcare, driving innovations in wearable technologies, continuous monitoring devices, and intelligent diagnostic systems. However, security, explainability, robustness, and performance optimization challenges remain critical barriers to widespread adoption in clinical environments. This research presents an innovative algorithmic method using the Adaptive Feature Evaluator (AFE) algorithm to improve feature selection in healthcare datasets and overcome these problems. By integrating Genetic Algorithms (GA), Explainable Artificial Intelligence (XAI), and Permutation Combination Techniques (PCT), the algorithm optimizes Clinical Decision Support Systems (CDSS), thereby enhancing predictive accuracy and interpretability. The proposed method is validated across three diverse healthcare datasets using six distinct machine learning algorithms, demonstrating its robustness and superiority over conventional feature selection techniques. The results underscore the transformative potential of AFE in smart healthcare, enabling personalized and transparent patient care. Notably, the AFE algorithm, when combined with a Multi-layer Perceptron (MLP), achieved an accuracy of up to 98.5%, highlighting its capability to improve clinical decision-making processes in real-world healthcare applications.
A SSM is Polymerized from Multivariate Time Series
State space models (SSMs) [1] [15] are foundational architectures that run in subquadratic time, unlike Transformers [2], and show strong performance with approximately linear complexity on long-range dependency tasks. Previous studies [3] [4] [5] have attempted to employ SSMs for Multivariate Time Series Forecasting (MTSF), but they all follow the Transformer-based MTSF modeling paradigm: learning dependencies between temporal tokens [6] [7] [8] [9], channel tokens [10], and their concatenation [11]. However, the complex dependency pattern specific to MTS is the Channel Dependency variations with Time (CDT), and none of these methods explicitly depicts it. Modeling the CDT directly is inappropriate because computing the dependency between the temporal tokens of all channels greatly increases complexity and is hard to generalize to the scale of most MTS data. We delved into the initial development of SSMs [12], the real-time approximation of a continuously updating function by an orthogonal function basis [13], and found that, compared with Transformers, SSMs have the potential to depict the CDT pattern efficiently and effectively.
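For readers unfamiliar with why SSMs scale approximately linearly with sequence length, the following is a minimal sketch of a generic discretized linear state space recurrence, x_k = A x_{k-1} + B u_k and y_k = C x_k. It is not the model proposed in the paper; the function name `ssm_scan` and the toy matrices are assumptions chosen for illustration, and the input is a single scalar channel.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discrete linear SSM over a scalar input sequence u (illustrative helper).

    A: (N, N) state matrix, B: (N,) input vector, C: (N,) output vector.
    One fixed-cost state update per time step gives cost linear in len(u).
    """
    state = np.zeros(A.shape[0])
    outputs = []
    for u_k in u:
        state = A @ state + B * u_k   # x_k = A x_{k-1} + B u_k
        outputs.append(C @ state)     # y_k = C x_k
    return np.array(outputs)

# Toy one-dimensional example: the impulse response decays geometrically.
A = np.array([[0.5]])
B = np.array([1.0])
C = np.array([1.0])
impulse_response = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

Each step touches only the fixed-size state, in contrast to attention, where every token is compared with every other token.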
Survey of Security and Data Attacks on Machine Unlearning In Financial and E-Commerce
Machine learning in the financial and e-commerce sectors uses vast amounts of data to predict trends, detect fraud, and optimize decision-making processes. However, as these models become more widespread, concerns over security and privacy have also increased. In response to such challenges, machine unlearning has been introduced as a solution that enables models to forget specific data points when necessary, particularly for compliance with data regulations such as the General Data Protection Regulation (GDPR). While machine unlearning provides an avenue for users to request the deletion of data from ML models, it also introduces new vulnerabilities to both privacy and security. Privacy and security attacks on machine unlearning are a growing area of concern, especially in sensitive financial applications where personal data is paramount. Two main categories of attacks can exploit this process: privacy attacks and security attacks. Privacy attacks target the confidentiality of data by attempting to reveal sensitive information, whereas security attacks aim to compromise the integrity and functionality of the machine unlearning process. In this paper, we survey the types of privacy and security data attacks specific to machine unlearning in financial applications.
How many words does ChatGPT know? The answer is ChatWords
Martínez, Gonzalo, Conde, Javier, Reviriego, Pedro, Merino-Gómez, Elena, Hernández, José Alberto, Lombardi, Fabrizio
The introduction of ChatGPT has put Artificial Intelligence (AI) Natural Language Processing (NLP) in the spotlight. ChatGPT adoption has been exponential, with millions of users experimenting with it in a myriad of tasks and application domains with impressive results. However, ChatGPT has limitations and suffers from hallucinations, for example producing answers that look plausible but are completely wrong. Evaluating the performance of ChatGPT and similar AI tools is a complex issue that is being explored from different perspectives. In this work, we contribute to those efforts with ChatWords, an automated test system to evaluate ChatGPT's knowledge of an arbitrary set of words. ChatWords is designed to be extensible, easy to use, and adaptable for evaluating other NLP AI tools as well. ChatWords is publicly available, and its main goal is to facilitate research on the lexical knowledge of AI tools. The benefits of ChatWords are illustrated with two case studies: evaluating the knowledge that ChatGPT has of the Spanish lexicon (taken from the official dictionary of the "Real Academia Española") and of the words that appear in the Quixote, the well-known novel written by Miguel de Cervantes. The results show that ChatGPT is only able to recognize approximately 80% of the words in the dictionary and 90% of the words in the Quixote, in some cases with an incorrect meaning. The implications of the lexical knowledge of NLP AI tools and potential applications of ChatWords are also discussed, providing directions for further work on the study of the lexical knowledge of AI tools.
A survey on natural language processing (NLP) and applications in insurance
Ly, Antoine, Uthayasooriyar, Benno, Wang, Tingting
Text is the most widely used means of communication today. This data is abundant but nevertheless complex to exploit within algorithms. For years, scientists have been trying to implement different techniques that enable computers to replicate some mechanisms of human reading. During the past five years, research has transformed the capacity of algorithms to unleash the value of text data. This brings many opportunities for the insurance industry today. Understanding those methods and, above all, knowing how to apply them is a major challenge and key to unleashing the value of text data that have been stored for many years. Processing language with computers brings many new opportunities, especially in the insurance sector, where reports are central to the information used by insurers. SCOR's Data Analytics team has been working on the implementation of innovative tools or products that enable the use of the latest research on text analysis. Understanding text mining techniques in insurance enhances the monitoring of the underwritten risks and many processes that ultimately benefit policyholders. This article explains the opportunities that Natural Language Processing (NLP) provides to insurance. It details different methods used in practice today and traces their history. We also illustrate the implementation of certain methods using open-source libraries and Python code that we have developed to facilitate the use of these techniques. After giving a general overview of the evolution of text mining during the past few years, we describe how to conduct a full study with text mining and share some examples of serving those models in insurance products or services. Finally, we explain in more detail every step that composes a Natural Language Processing study so that the reader can gain a deep understanding of the implementation.